Average path length in random networks
An analytic solution for the average path length in a large class of random graphs is found. We
apply the approach to classical random graphs of Erdős and Rényi (ER) and to scale-free networks
of Barabási and Albert (BA). In both cases our results confirm previous observations: small-world
behavior in classical random graphs, $l_{ER} \sim \ln N$, and the ultra-small-world effect
characterizing scale-free BA networks, $l_{BA} \sim \ln N / \ln\ln N$. In the case of scale-free
random graphs with power-law degree distributions we observed saturation of the average path
length in the limit $N \to \infty$ for systems with the scaling exponent $2 < \alpha < 3$ and
small-world behaviour for systems with $\alpha > 3$.

During the last few years random, evolving networks have become a very popular research domain
among physicists. A lot of effort has been put into the investigation of such systems in order to
recognize their structure and to analyze emerging complex properties. It was observed that, despite
network diversity, most real web-like systems share three prominent features: small average path
length (APL), high clustering and skewed degree distribution. Several network topology generators
have been proposed to embody these fundamental network characteristics. Thanks to extensive
numerical simulations, realistic models of real networks have been created and analyzed, especially
ones based on the preferential attachment rule introduced by Barabási and Albert. The most basic
issues within the scope of network investigation are structural: connectivity distributions,
correlation analyses (including clustering) and, finally, estimations of the APL. The last
characteristic is of great importance for network studies as it delivers basic information on the
type of network geometry. A better understanding of network topology matters for modern network
design and indirectly affects such crucial fields as information processing in different
communication systems (including the Internet), disease or rumor transmission in social networks
[22, 23] and network optimization: these processes become more efficient when the mean distance
between network sites is smaller.

It is well known that random networks such as Erdős and Rényi (ER) graphs, as well as partially
random networks such as Watts-Strogatz small-world models, have a very small APL, which scales as
$l \sim \ln N$, where $N$ is the network size. In fact, it was expected that the logarithmic size
effect on the APL is a common property of random networks. Very recently Cohen and Havlin found
that random networks with power-law degree distribution $P(k) \sim k^{-\alpha}$ and the scaling
exponent $2 < \alpha < 3$ exhibit anomalous scaling of the average distance, $l \sim \ln\ln N$.
Such anomalous scaling is expected to lead to anomalies in diffusion and transport phenomena
within the networks. The result is particularly interesting since it is known that most real
networks, including both man-made communication networks like the Internet and natural networks
like food or metabolic networks, exhibit scale-free character with the relevant scaling exponents.

The paper presents an analytic theory describing metric features of random networks. It allows one
to calculate the main network characteristics: the APL, the intervertex distance distribution and
the mean number of vertices at a certain distance away from a randomly chosen vertex. We compare
our analytic results with numerical simulations performed for ER random graphs and for scale-free
Barabási and Albert (BA) networks.

Let us start with the following lemma. 

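Lemma 1 If $A_1, A_2, \ldots, A_n$ are mutually independent events and their probabilities fulfill
the relations $\forall_i\, P(A_i) \le \varepsilon$, then

$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = 1 - \exp\Big(-\sum_{i=1}^{n} P(A_i)\Big) - Q, \tag{1}$$

where $0 \le Q < \sum_{j=0}^{n+1} (n\varepsilon)^j / j! - (1 + \varepsilon)^n$.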

Proof. Using the method of inclusion and exclusion we get

$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{j=1}^{n} (-1)^{j+1} S(j), \tag{2}$$

with

$$S(j) = \sum_{1 \le i_1 < i_2 < \ldots < i_j \le n} P(A_{i_1}) P(A_{i_2}) \cdots P(A_{i_j})
      = \frac{1}{j!} \Big(\sum_{i=1}^{n} P(A_i)\Big)^{j} - Q_j, \tag{3}$$

where $0 \le Q_j \le \big(n^j / j! - \binom{n}{j}\big)\, \varepsilon^j$. The term in brackets
represents the total number of redundant components occurring in the last expression of (3).
Neglecting $Q_j$ it is easy to see that $(1 - P(\bigcup A_i))$ corresponds to the first $(n+1)$
terms in the Maclaurin expansion of $\exp(-\sum P(A_i))$. The effect of higher-order terms in this
expansion is smaller than $R < (n\varepsilon)^{n+1} / (n+1)!$. It follows that the total error of
(1) may be estimated as $Q < \sum_{j=1}^{n} Q_j + R$. This completes the proof.

Let us notice that the terms $Q_j$ in (3) disappear when one approximates the multiple sums
$\sum_{1 \le i_1 < i_2 < \ldots < i_j \le n}$ by the corresponding multiple integrals. For
$\varepsilon = A/n \ll 1$ the error of the above assessment is less than $A^2 \exp(A)/n$ and may
be dropped in the limit $n \to \infty$.
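The size of the correction in Lemma 1 can be checked numerically. The following is a minimal sketch, assuming $n$ equally likely independent events with $P(A_i) = \varepsilon = A/n$; the function names and the values of `n` and `A` are illustrative choices, not taken from the paper. It compares the magnitude of the deviation between the exact union probability and $1 - \exp(-\sum P(A_i))$ with the bound on $Q$ and with the cruder estimate $A^2 \exp(A)/n$.

```python
import math

def union_probability(probs):
    """Exact P(A_1 or ... or A_n) for mutually independent events."""
    none_occurs = 1.0
    for p in probs:
        none_occurs *= 1.0 - p
    return 1.0 - none_occurs

def lemma_estimate(probs):
    """Leading term of Eq. (1): 1 - exp(-sum_i P(A_i))."""
    return 1.0 - math.exp(-sum(probs))

def q_bound(n, eps):
    """Bound from Lemma 1: sum_{j=0}^{n+1} (n*eps)^j / j! - (1 + eps)^n."""
    term, partial_sum = 1.0, 1.0           # j = 0 term
    for j in range(1, n + 2):
        term *= n * eps / j                # builds (n*eps)^j / j! iteratively
        partial_sum += term
    return partial_sum - (1.0 + eps) ** n

n, A = 1000, 2.0                           # illustrative values (assumption)
eps = A / n
probs = [eps] * n
# Only the magnitude of the deviation is compared with the bound here.
deviation = abs(lemma_estimate(probs) - union_probability(probs))
print(deviation, q_bound(n, eps), A ** 2 * math.exp(A) / n)
```

Building the terms $(n\varepsilon)^j / j!$ iteratively avoids converting very large factorials to floating point.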

A random graph with a given degree distribution $P(k)$ is the simplest network model. In such a
network the total number of vertices $N$ is fixed. Degrees of all vertices are independent,
identically distributed random integers drawn from a specified distribution $P(k)$, and there are
no vertex-vertex correlations. Because of the lack of correlations, the probability that there
exists a walk of length $x$ crossing the index-linked vertices
$\{i, v_1, v_2, \ldots, v_{(x-1)}, j\}$ is described by the product
$p_{i v_1}\, p_{v_1 v_2 | i v_1}\, p_{v_2 v_3 | v_1 v_2} \cdots p_{v_{(x-1)} j | v_{(x-2)} v_{(x-1)}}$,
where

$$p_{ij} = \frac{k_i k_j}{\langle k \rangle N} \tag{4}$$

gives the connection probability between vertices $i$ and $j$ with degrees $k_i$ and $k_j$
respectively, whereas

$$p_{ij|li} = \frac{(k_i - 1)\, k_j}{\langle k \rangle N} \tag{5}$$

describes the conditional probability of a link $\{i, j\}$ given that there exists another link
$\{l, i\}$. It is important to stress that graph theory distinguishes a walk from a path [30].
A walk is just a sequence of vertices. The only condition for such a sequence is that two
successive nodes are nearest neighbors. A walk is termed a path if all of its vertices are
distinct. In fact we are interested in the shortest paths. Let us consider the situation when
there exists at least one walk of length $x$ between the vertices $i$ and $j$. If the walk(s)
is (are) the shortest path(s), $i$ and $j$ are exactly $x$-th neighbors; otherwise they are closer
neighbors. In terms of the statistical ensemble of random graphs [31], the probability $p_{ij}(x)$
of at least one walk of length $x$ between $i$ and $j$ also expresses the probability that these
nodes are neighbors of order not higher than $x$. Thus, the probability that $i$ and $j$ are
exactly $x$-th neighbors is given by the difference

$$p^{*}_{ij}(x) = p_{ij}(x) - p_{ij}(x-1). \tag{6}$$



In order to write the formula for $p_{ij}(x)$ we take advantage of Lemma 1:

$$p_{ij}(x) = 1 - Q - \exp\Big[-\sum_{v_1=1}^{N} \sum_{v_2=1}^{N} \cdots \sum_{v_{(x-1)}=1}^{N}
  p_{i v_1}\, p_{v_1 v_2 | i v_1} \cdots p_{v_{(x-1)} j | v_{(x-2)} v_{(x-1)}}\Big], \tag{7}$$

where $N$ is the total number of vertices in a network. A sequence of $(x+1)$ vertices
$\{i, v_1, v_2, \ldots, v_{(x-1)}, j\}$ beginning with $i$ and ending with $j$ corresponds to a
single event $A_i$, and the number of such events is given by $n = N^{x-1}$. Putting (4) and (5)
into (7) and replacing the summation over node indices by the summation over the degree
distribution $P(k)$ one gets:

$$p_{ij}(x) = 1 - \exp\Big[-\frac{k_i k_j}{N}\,
  \frac{\langle k(k-1) \rangle^{x-1}}{\langle k \rangle^{x}}\Big] - Q. \tag{8}$$



The assumption underlying (8) is the mutual independence of all contributing events $A_i$. In
fact, since the same edge may participate in several $x$-walks, there exist correlations between
these events. Nevertheless, it is easy to see that the fraction of correlated walks is negligible
for short walks ($x \ll N$) that play the major role in random graphs showing small-world behavior.
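The step from the multiple sum over intermediate vertices to the closed form in the exponent can be verified by brute force on a small example. The sketch below assumes an arbitrary toy degree sequence and hypothetical function names; it sums the product of the connection probability (4) and the conditional probabilities (5) over all intermediate vertices and compares the result with $\frac{k_i k_j}{N} \langle k(k-1) \rangle^{x-1} / \langle k \rangle^{x}$.

```python
import itertools
import math

def walk_sum_bruteforce(degrees, i, j, x):
    """Sum over all v_1..v_(x-1) of the walk probability built from (4) and (5)."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    total = 0.0
    for mids in itertools.product(range(n), repeat=x - 1):
        seq = (i, *mids, j)
        # First step: unconditional connection probability, Eq. (4).
        prob = degrees[seq[0]] * degrees[seq[1]] / (mean_k * n)
        # Later steps: conditional probability given the previous link, Eq. (5).
        for a, b in zip(seq[1:-1], seq[2:]):
            prob *= (degrees[a] - 1) * degrees[b] / (mean_k * n)
        total += prob
    return total

def walk_sum_closed(degrees, i, j, x):
    """Closed form of the same sum: (k_i k_j / N) <k(k-1)>^(x-1) / <k>^x."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_kk1 = sum(k * (k - 1) for k in degrees) / n
    return degrees[i] * degrees[j] / n * mean_kk1 ** (x - 1) / mean_k ** x

def p_at_most(degrees, i, j, x):
    """Probability that i and j are neighbors of order <= x, with Q neglected."""
    return 1.0 - math.exp(-walk_sum_closed(degrees, i, j, x))

toy_degrees = [1, 2, 2, 3, 4, 2]           # illustrative degree sequence (assumption)
print(walk_sum_bruteforce(toy_degrees, 0, 4, 3), walk_sum_closed(toy_degrees, 0, 4, 3))
```

The brute-force loop grows as $N^{x-1}$, so it is only meant as a consistency check for very small $N$ and $x$.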

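The question is when the term $Q$ in (8) may be neglected. To work out the problem let us perform
the following reasoning: if for all pairs $(i, j)$ there exists $\varepsilon \ll 1$ such that
$p_{ij} \le \varepsilon$, then for all $x \ge 1$ we have
$p_{i v_1}\, p_{v_1 v_2 | i v_1} \cdots p_{v_{(x-1)} j | v_{(x-2)} v_{(x-1)}} \le \varepsilon^{x} \ll 1$
and $Q$ may be ignored. In fact, due to (4) the condition $p_{ij} \ll 1$ is not fulfilled for
pairs of vertices $i$ and $j$ possessing large degrees $k_i$ and $k_j$. The fraction of such pairs
may be estimated as

$$\int P(k_i) \int_{\varepsilon \langle k \rangle N / k_i} P(k_j)\, dk_j\, dk_i \ll 1. \tag{9}$$

Using Chebyshev's inequality and solving (9) with respect to $\varepsilon \ll 1$ one gets the
condition under which $Q$ may be dropped:

$$\frac{\langle k^2 \rangle}{\langle k \rangle^{2}}
  \big(\langle k^2 \rangle - \langle k \rangle^{2}\big) \ll N^{2}. \tag{10}$$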



Due to (6) the probability that both vertices are exactly $x$-th neighbors may be written as

$$p^{*}_{ij}(x) = F(x-1) - F(x), \tag{11}$$

where

$$F(x) = \exp\Big[-\frac{k_i k_j}{N}\,
  \frac{(\langle k^2 \rangle - \langle k \rangle)^{x-1}}{\langle k \rangle^{x}}\Big]. \tag{12}$$

Note that averaging (11) over all pairs of vertices one obtains the intervertex distance
distribution $p(x) = \langle\langle p^{*}_{ij}(x) \rangle_i \rangle_j$. Now the mean number of
vertices at a certain distance $x$ away from a randomly chosen vertex $i$ can be written as
$z_x = \int p^{*}_{ij}(x) P(k_j) N\, dk_j$. Taking only the first two terms of the power series
expansion of both exponential functions in (11) one gets the relationship obtained by Newman et
al. [16], $z_x = (z_2 / z_1)^{x-1} z_1$, which was derived assuming a tree-like structure of
random graphs.
The expectation value for the APL between $i$ and $j$ is

$$l_{ij}(k_i, k_j) = \sum_{x=1}^{\infty} x\, p^{*}_{ij}(x) = \sum_{x=0}^{\infty} F(x). \tag{13}$$






Notice that a walk may cross the same node several times, thus the largest possible walk length
can be $x = \infty$. The Poisson summation formula allows us to simplify (13):

$$l_{ij}(k_i, k_j) = \frac{-\ln k_i k_j + \ln(\langle k^2 \rangle - \langle k \rangle) + \ln N - \gamma}
  {\ln(\langle k^2 \rangle / \langle k \rangle - 1)} + \frac{1}{2}, \tag{14}$$

where $\gamma \simeq 0.5772$ is Euler's constant. The average intervertex distance for the whole
network depends on the specified degree distribution $P(k)$:

$$l = \frac{\ln(\langle k^2 \rangle - \langle k \rangle) - 2 \langle \ln k \rangle + \ln N - \gamma}
  {\ln(\langle k^2 \rangle / \langle k \rangle - 1)} + \frac{1}{2}. \tag{15}$$

The formulas (14) and (15) diverge when $\langle k^2 \rangle = 2 \langle k \rangle$, giving the
well-known estimation of the percolation threshold in undirected random graphs.
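To see how close the closed form is to the sum it replaces, one can evaluate both numerically. In the sketch below $F(x)$ is written as $\exp(-a b^{x})$ with $a = k_i k_j / (N(\langle k^2 \rangle - \langle k \rangle))$ and $b = \langle k^2 \rangle / \langle k \rangle - 1$, so the closed form reads $(-\ln a - \gamma)/\ln b + 1/2$. The function names and the example moments (a Poisson-like case with $\langle k \rangle = 10$, $\langle k^2 \rangle = 110$) are assumptions made for illustration; the network-level function expects a degree sequence with all $k \ge 1$.

```python
import math

EULER_GAMMA = 0.5772156649015329

def apl_pair_direct(ki, kj, n, mean_k, mean_k2):
    """Direct evaluation of the sum over x = 0, 1, 2, ... of F(x)."""
    log_a = math.log(ki * kj / (n * (mean_k2 - mean_k)))
    log_b = math.log(mean_k2 / mean_k - 1.0)
    if log_b <= 0.0:
        raise ValueError("requires <k^2>/<k> - 1 > 1")
    total, x = 0.0, 0
    # Once a * b^x exceeds e^40, F(x) = exp(-a b^x) is zero in double precision.
    while log_a + x * log_b < 40.0:
        total += math.exp(-math.exp(log_a + x * log_b))
        x += 1
    return total

def apl_pair_closed(ki, kj, n, mean_k, mean_k2):
    """Closed form for a pair of vertices with degrees ki and kj."""
    numerator = -math.log(ki * kj) + math.log(mean_k2 - mean_k) + math.log(n) - EULER_GAMMA
    return numerator / math.log(mean_k2 / mean_k - 1.0) + 0.5

def apl_network(degrees):
    """Network-level average: -ln(ki kj) is replaced by -2 <ln k>."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    mean_ln_k = sum(math.log(k) for k in degrees) / n
    numerator = math.log(mean_k2 - mean_k) - 2.0 * mean_ln_k + math.log(n) - EULER_GAMMA
    return numerator / math.log(mean_k2 / mean_k - 1.0) + 0.5

direct = apl_pair_direct(10, 10, 10 ** 6, 10.0, 110.0)
closed = apl_pair_closed(10, 10, 10 ** 6, 10.0, 110.0)
print(direct, closed)
```

The two values agree up to a small oscillating remainder that the leading Poisson-summation terms do not capture; it shrinks as $\ln b$ decreases.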

To test the formula (15) we start with two well-known networks: ER classical random graphs and
scale-free BA networks. The choice of these two networks is not accidental. Both models play an
important role in network science. The ER model was historically the first one, but it has been
realized that it is too random to describe real networks. The most striking discrepancy between
the ER model and real networks appears when comparing degree distributions. As mentioned at the
beginning of the paper, the degree distribution follows a power law in most real systems, whereas
classical random graphs exhibit a Poisson degree distribution. The only known mechanism driving
real networks into scale-free structures is preferential attachment. The simplest model that
incorporates the rule of preferential attachment was originally introduced by Barabási and Albert.

Classical ER random graphs. For these networks the degree distribution is given by the Poisson
function $P(k) = e^{-\langle k \rangle} \langle k \rangle^{k} / k!$ and the condition (10) is
always fulfilled. However, since $\langle \ln k \rangle$ cannot be calculated analytically for the
Poisson distribution, the APL may not be directly obtained from (15). To overcome this problem we
take advantage of the mean-field approximation. Let us assume that all vertices within a graph
possess the same degree, $\forall_i\, k_i = \langle k \rangle$. It implies that the APL between
two arbitrary nodes $i$ and $j$ (14) should describe the average intervertex distance of the whole
network:

$$l_{ER} = \frac{\ln N - \gamma}{\ln(pN)} + \frac{1}{2}. \tag{16}$$

Until now only a rough estimation of this quantity has been known. It was expected that the
average shortest path length of the whole ER graph scales with the number of nodes in the same way
as the network diameter. We recall that the diameter $d$ of a graph is defined as the maximal
distance between any pair of vertices, and $d_{ER} = \ln N / \ln(pN)$. Fig. 1 shows the prediction
of equation (16) in comparison with the numerically calculated APL in classical random graphs.

FIG. 1: The average path length $l_{ER}$ versus network size $N$ in ER classical random graphs
with $\langle k \rangle = pN = 4, 10, 20$. The solid curves represent the prediction of Eq. (16).
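A comparison of this kind can be reproduced on a small scale. The following is a rough sketch, not the simulation used for the figure: it assumes the $G(N, p)$ construction in which every pair of vertices is linked independently with probability $p$, estimates the APL by breadth-first search from a subset of source vertices, and ignores unreachable pairs. The network size, mean degree, seed and number of sources are arbitrary illustrative choices.

```python
import math
import random
from collections import deque

EULER_GAMMA = 0.5772156649015329

def apl_er_formula(n, p):
    """Mean-field prediction: (ln N - gamma) / ln(pN) + 1/2."""
    return (math.log(n) - EULER_GAMMA) / math.log(p * n) + 0.5

def er_graph(n, p, rng):
    """Adjacency lists of a G(N, p) random graph."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def mean_distance(adj, sources):
    """Average BFS distance from the given sources to all reachable vertices."""
    total, pairs = 0, 0
    for s in sources:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1             # exclude the source itself
    return total / pairs

rng = random.Random(1)                     # fixed seed, arbitrary choice
n, mean_degree = 1000, 10                  # illustrative values (assumption)
adj = er_graph(n, mean_degree / n, rng)
measured = mean_distance(adj, range(0, n, 20))
predicted = apl_er_formula(n, mean_degree / n)
print(f"simulated: {measured:.3f}, formula: {predicted:.3f}")
```

Sampling sources keeps the run time low; averaging over all vertices and over several graph realizations would give a smoother estimate.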



Scale-free BA networks. The basis of the BA model is its construction procedure. Two important
ingredients of the procedure are continuous network growth and preferential attachment. The
network starts to grow from an initial cluster of $m$ fully connected vertices. Each new node that
is added to the network creates $m$ links that connect it to previously added nodes. Preferential
attachment means that the probability of a new link growing out of a vertex $i$ and ending up in a
vertex $j$ is given by $p^{BA}_{ij} = m k_j(t_i) / \sum_l k_l(t_i)$, where $k_j(t_i)$ denotes the
connectivity of a node $j$ at the time when a new node $i$ is added to the network. Taking into
account the time evolution of node degrees in BA networks one can show that the probability
$p^{BA}_{ij}$ is equivalent to (4). Now let us consider the conditional probability $p_{ij|li}$.
Checking the possible time order of the vertices $i$, $j$, $l$ it is easy to see that in five of
the $3!$ cases $p_{ij|li} = p_{ij}$, and in a good approximation we get instead of (8) the result

$$p^{BA}_{ij}(x) = 1 - \exp\Big[-\frac{k_i k_j}{N}\,
  \frac{\langle k^2 \rangle^{x-1}}{\langle k \rangle^{x}}\Big]. \tag{17}$$



It was found [34] that the degree distribution in BA networks is given by
$P(k) = 2 m^2 k^{-\alpha}$, where $k = m, m+1, \ldots, m\sqrt{N}$, and the scaling exponent
$\alpha = 3$. Putting $\langle k \rangle = 2m$, $\langle k^2 \rangle = m^2 \ln N$ and taking into
account (17) one gets that the APL between $i$ and $j$ is given by

$$l^{BA}_{ij}(k_i, k_j) = \frac{-\ln(k_i k_j) + \ln N + \ln(2m) - \gamma}{\ln\ln N + \ln(m/2)}
  + \frac{3}{2}. \tag{18}$$

Averaging (18) over all vertices we obtain

$$l_{BA} = \frac{\ln N - \ln(m/2) - 1 - \gamma}{\ln\ln N + \ln(m/2)} + \frac{3}{2}. \tag{19}$$
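The averaging step can be checked numerically. For $P(k) = 2 m^2 k^{-3}$ treated as a continuous distribution on $k \ge m$ (an assumption of this sketch, which ignores the upper cutoff), the substitution $u = m/k$ gives $\langle \ln k \rangle = \int_0^1 2u \ln(m/u)\, du = \ln m + 1/2$, so replacing $\ln(k_i k_j)$ in (18) by $2 \langle \ln k \rangle$ should reproduce (19). Function names and parameter values below are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329

def apl_ba_pair(ki, kj, n, m):
    """Pairwise APL in a BA network, Eq. (18)."""
    numerator = -math.log(ki * kj) + math.log(n) + math.log(2 * m) - EULER_GAMMA
    return numerator / (math.log(math.log(n)) + math.log(m / 2)) + 1.5

def apl_ba(n, m):
    """Network-level APL in a BA network, Eq. (19)."""
    numerator = math.log(n) - math.log(m / 2) - 1.0 - EULER_GAMMA
    return numerator / (math.log(math.log(n)) + math.log(m / 2)) + 1.5

def mean_log_degree_ba(m, steps=100_000):
    """<ln k> for P(k) = 2 m^2 k^-3 on k >= m, by the midpoint rule in u = m/k."""
    h = 1.0 / steps
    total = 0.0
    for s in range(steps):
        u = (s + 0.5) * h
        total += 2.0 * u * math.log(m / u)
    return total * h

m, n = 3, 10 ** 5                          # illustrative values (assumption)
k_typical = m * math.exp(0.5)              # degree with ln k = <ln k> = ln m + 1/2
print(mean_log_degree_ba(m), math.log(m) + 0.5)
print(apl_ba_pair(k_typical, k_typical, n, m), apl_ba(n, m))
```

Evaluating (18) at a degree whose logarithm equals $\langle \ln k \rangle$ is only a shortcut for the average, valid because (18) is linear in $\ln k_i + \ln k_j$.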



Fig. 2 shows the APL of BA networks as a function of the network size $N$ compared with the
analytical formula (19). There is a visible discrepancy between the theory and numerical results
when $\langle k \rangle = 4$. The discrepancy disappears when the network becomes denser, i.e.
when $\langle k \rangle$ increases. We suspect that it is the effect of structural correlations
that occur within the evolving networks and are absent in the random graphs used for the analytic
calculations. The results let us deduce that the correlations become less important in denser
networks.

FIG. 2: Characteristic path length $l_{BA}$ versus network size $N$ in BA networks. Solid lines
represent Eq. (19).

Scale-free networks with arbitrary scaling exponent. Let us consider scale-free random graphs
with degree distribution given by a power law, i.e.
$P_{\alpha}(k) = (\alpha - 1)\, m^{\alpha-1} k^{-\alpha}$, where
$k = m, m+1, \ldots, m N^{1/(\alpha-1)}$. Solving (10) for $P_{\alpha}(k)$ one can see that our
approach should work for $\alpha > 2$. Taking advantage of (15) we get that for large networks,
$N \gg 1$, the APL scales as follows:

• $l \simeq 2/(3 - \alpha) + 1/2$ for $2 < \alpha < 3$,

• $l \simeq \ln N / \ln\ln N + 3/2$ for $\alpha = 3$,

• $l \simeq \ln N / \ln(m(\alpha - 2)/(\alpha - 3) - 1) + 1/2$ for $\alpha > 3$.

The result for $\alpha > 3$ is consistent with estimations obtained by Cohen and Havlin [17]. The
first case, with $l$ independent of $N$, shows that there is a saturation effect for the mean path
length in large networks. Note that the effect was observed in metabolic networks [36].
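The first and third regimes above can be illustrated by inserting the moments of the power-law distribution into the general APL formula. The sketch below assumes a continuous approximation of $P_{\alpha}(k)$ on $[m, m N^{1/(\alpha-1)}]$, renormalized over that interval, with closed-form integrals for $\langle k \rangle$, $\langle k^2 \rangle$ and $\langle \ln k \rangle$; the helper names and parameter values are hypothetical. Convergence to the asymptotic expressions is logarithmically slow, so a very large $N$ is used.

```python
import math

EULER_GAMMA = 0.5772156649015329

def power_integral(e, lo, hi):
    """Integral of k**(e - 1) dk from lo to hi."""
    if abs(e) < 1e-12:
        return math.log(hi / lo)
    return (hi ** e - lo ** e) / e

def log_moment_integral(alpha, lo, hi):
    """Integral of ln(k) * k**(-alpha) dk from lo to hi, for alpha != 1."""
    c = 1.0 - alpha
    def antiderivative(k):
        return k ** c * (math.log(k) / c - 1.0 / c ** 2)
    return antiderivative(hi) - antiderivative(lo)

def apl_power_law(alpha, m, n):
    """General APL formula evaluated with continuous power-law moments."""
    k_max = m * n ** (1.0 / (alpha - 1.0))
    norm = power_integral(1.0 - alpha, m, k_max)
    mean_k = power_integral(2.0 - alpha, m, k_max) / norm
    mean_k2 = power_integral(3.0 - alpha, m, k_max) / norm
    mean_ln_k = log_moment_integral(alpha, m, k_max) / norm
    numerator = (math.log(mean_k2 - mean_k) - 2.0 * mean_ln_k
                 + math.log(n) - EULER_GAMMA)
    return numerator / math.log(mean_k2 / mean_k - 1.0) + 0.5

m, n = 2, 1e200                            # illustrative values (assumption)
saturated = apl_power_law(2.5, m, n)       # compare with 2/(3 - alpha) + 1/2
small_world = apl_power_law(4.0, m, n)     # compare with ln N / ln(m(a-2)/(a-3) - 1) + 1/2
print(saturated, 2.0 / (3.0 - 2.5) + 0.5)
print(small_world, math.log(n) / math.log(m * 2.0 / 1.0 - 1.0) + 0.5)
```

For moderate sizes such as $N = 10^6$ the same function gives values well below the saturation limit, which is worth keeping in mind when comparing with simulations.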

In conclusion, we presented a theory for metric properties of random networks with arbitrary
degree distribution. The approach is applied to get an analytic formula for the APL in a large
class of undirected random graphs with an arbitrary degree distribution $P(k)$. The results are in
very good agreement with numerical simulations performed for ER random graphs and for BA networks.
We observed saturation of $l$ in the limit $N \to \infty$ for scale-free networks with scaling
exponents from the range $2 < \alpha < 3$, small-world behaviour for networks with $\alpha > 3$
and ultra-small-world behaviour of the BA model. Our derivations show that the behaviour of the
APL within scale-free networks is even more intriguing than reported in the recent paper of Cohen
and Havlin [17].

Appendix. After finishing the paper we learned about a preprint on this subject written by
Dorogovtsev, Mendes and Samukhin [33]. Based on the generating function formalism, the authors
derived a similar formula for the APL in random graphs,
$l \simeq \ln N / \ln(\langle k^2 \rangle / \langle k \rangle - 1)$.